
Spider
A spider is a program that travels the World Wide Web and compiles the information it
uncovers into a database. Spiders "crawl" across the Web by following the links in one HTML
document to reach another. When asked a question, a spider searches its database
rather than querying the Web itself. Not all spiders are created equal: it is
worth noting how large a spider's database is, how frequently the spider conducts its
searches, and whether it is restricted to searching a fixed set of Web servers.
Try a variety of spiders to discover which one returns the most useful results.
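The crawl-then-search cycle described above can be sketched as follows. This is a minimal illustration in Python against a made-up in-memory "web" (the page names and contents are invented for the example), not a production crawler:

```python
from collections import deque
from html.parser import HTMLParser

# A tiny made-up in-memory "web": page name -> HTML source.
# (A real spider would fetch these documents over HTTP.)
PAGES = {
    "a.html": '<html><body>spider homepage <a href="b.html">B</a> <a href="c.html">C</a></body></html>',
    "b.html": '<html><body>ground spider <a href="c.html">C</a></body></html>',
    "c.html": '<html><body>web crawler</body></html>',
}

class PageParser(HTMLParser):
    """Pull out both the links (to crawl next) and the words (to index)."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def crawl_and_index(start):
    """Breadth-first crawl from `start`, compiling a word -> pages database."""
    index = {}
    seen = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in seen or page not in PAGES:
            continue
        seen.add(page)
        parser = PageParser()
        parser.feed(PAGES[page])
        for word in parser.words:
            index.setdefault(word, set()).add(page)
        queue.extend(parser.links)
    return index

# A query searches the compiled database, not the live Web:
index = crawl_and_index("a.html")
print(sorted(index["spider"]))  # -> ['a.html', 'b.html']
```

Because the database is built once during the crawl, answering a query is a simple lookup, which is why a spider can respond without touching the Web at query time.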
URLs:
- WWW Tools and Guides (Search Forms, Search Engines, Web Page Lists,
  Posting Services, HTML Guides)
- "A quick stop for links to search engines"
- Lycos
- Lycos is the deservedly popular Web search engine located at
Carnegie Mellon. The word "lycos" is taken from Lycosidae, the name for
a family of large ground spiders that capture their prey rather than
trapping it in a web. It is a favorite search engine of many because it
includes text from each hit, which allows the user to preview the
usefulness of the results. (Many search engines link directly to a URL,
so each result must be visited in order to see what is there.)
- Open Text Index
- "Over 15 million links. All instantly searchable. All constantly
updated." Open Text is the search software that Yahoo uses.
W3E References:
- search engines
- WebCrawler
- worm
Print References:
- "Finding Needles in a Haystack--How to use search engines to
dredge up only the things you need to see" by Clay Shirky.
NetGuide, issue 210, p. 87, October 1, 1995.
- "Protocol Gives Sites Way to Keep out the 'Bots'" by Jeremy Carl.
Web Week, vol. 1, issue 7, November 1995.
- The World Wide Web Unleashed by John December and Neil
Randall. Sams Publishing, Indianapolis, IN, 1995.
ISBN: 0-672-30737-5.
Detail:
Some sites use the robot exclusion protocol to keep "robots" out.
Developed in 1994, the protocol takes the form of a file on the server
(conventionally /robots.txt) listing user agent names and the paths
those agents may not visit. The file can exclude certain types of
robots as well as exclude robots from particular parts of a site.
Sites that change frequently, such as newspaper sites, or sites that
cannot handle the traffic that robots bring, have valid reasons for
wishing to exclude these search engines. Major search sites, such as
Open Text and Lycos, also have an interest in following the robot
exclusion protocol, as some of them use it themselves to protect
portions of their own sites from unwanted searches.
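The exclusion file is plain text: each record names a user agent and the path prefixes it may not request. The sketch below uses Python's standard urllib.robotparser against a hypothetical exclusion file (the user agent names and paths are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical exclusion file: records are separated by blank lines.
# "*" applies to all robots; "Disallow: /" shuts a robot out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /breaking-news/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved robot checks before fetching each path:
print(parser.can_fetch("Lycos", "/archive/page.html"))         # -> True
print(parser.can_fetch("Lycos", "/breaking-news/today.html"))  # -> False
print(parser.can_fetch("BadBot", "/archive/page.html"))        # -> False
```

Note that the protocol is purely advisory: nothing stops a rude robot from ignoring the file, which is why it works only because the major spiders choose to honor it.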

E-Mail:
The World Wide Web Encyclopedia at wwwe@tab.com
E-Mail: Charles River Media at chrivmedia@aol.com
Copyright 1996 Charles River Media. All rights reserved.
Text - Copyright © 1995, 1996 - James Michael Stewart & Ed Tittel.
Web Layout - Copyright © 1995, 1996 - LANWrights &
IMPACT Online.
Revised -- February 20th, 1996